training curve
Now consider a perturbation of the prior distribution over transition functions, $\delta : \mathcal{T} \to \mathbb{R}_{\geq 0}$, such that $\int_{T_p} \delta(T_p) P(T_p \mid h_0) \, dT_p = 1$.

Proof: Proposition 2 directly extends Proposition 1 in [8] to BAMDPs.

Therefore, the perturbed distribution over histories is also a valid probability distribution.

Provided that $c_{bo}$ is chosen appropriately (details in the appendix), as the number of perturbations expanded approaches $\infty$, a perturbation within any $\epsilon > 0$ of the optimal perturbation will be expanded by the Bayesian optimisation procedure with probability $1 - \delta$.

Proof: Consider an adversary decision node, $v$, associated with augmented state $(s, ha, y)$ in the BACVaR-SG. We begin by proving that $Q((s, ha, y), \xi)$ is continuous with respect to $\xi$. Define a function $d : S \to \mathbb{R}$, such that $\xi + d$ produces a valid adversary perturbation.
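As a concrete illustration of the normalisation condition on the prior perturbation stated at the start of this passage, the following is a minimal sketch in a discrete setting, where the integral over transition functions becomes a finite sum. The three-element prior, the perturbation values, and the helper names `is_valid_perturbation` and `perturbed_prior` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical discrete prior P(T_p | h_0) over three candidate transition functions.
prior = np.array([0.5, 0.3, 0.2])

# Hypothetical non-negative perturbation delta(T_p), chosen so that
# sum_i delta_i * prior_i = 1 (discrete analogue of the integral constraint).
delta = np.array([0.4, 1.0, 2.5])


def is_valid_perturbation(delta, prior, tol=1e-9):
    """Check non-negativity and the normalisation constraint sum(delta * prior) = 1."""
    return bool(np.all(delta >= 0) and abs(np.dot(delta, prior) - 1.0) < tol)


def perturbed_prior(delta, prior):
    """Return the perturbed weights delta(T_p) * P(T_p | h_0)."""
    return delta * prior


if __name__ == "__main__":
    print(is_valid_perturbation(delta, prior))
    print(perturbed_prior(delta, prior))
```

Under these assumptions the perturbed weights are non-negative and sum to one, so they form a valid probability distribution over the candidate transition functions. This is the discrete counterpart of the property the passage relies on.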